Clustering and Caching GeoServer
This page last changed on Feb 23, 2007 by cholmes.
With a small budget to work with, I decided to go with a three-machine "cluster" of GeoServer boxes. First, I built one machine (dual Xeon 2.8 GHz with 1.5 GB of RAM) and configured it to run GeoServer.
Squid listens on port 80, forwarding requests through to GeoServer on port 8080 for anything not in its cache (this is called "http accelerator mode" in Squid; see the Squid 'accelerator mode' FAQ/HOWTO). In addition, Squid is configured to broadcast out to its peers to find things not in its cache, in case another nearby server has already seen the requested content. A configuration sketch follows the request walk-through below.

Note: this last part (linking the Squids as peers) was very hard, and I still don't have it right. It involves the "cache_peer" Squid directive, along with a bunch of other ACL-ish directives allowing cache access from the specific peers. Sometimes I get really fast access to previously generated images; sometimes I don't. At some point I'll probably figure out what's going on and post a solution here.

I then cloned the hard disk two more times using g4u - Ghost 4 Unix (a.k.a. "slurpdisk") and a pair of IDE cables, and put the two "new" hard drives into the other computers. Then I changed their respective IP addresses and hostnames, and hooked them all up to a gigabit switch. So now a request for a map to one of the three computers goes something like this:

1. Squid checks its own cache; if the image is already there, it is returned immediately.
2. If not, Squid queries its peer caches for the requested content.
3. If a peer has it, the image is fetched from that peer and returned.
4. Otherwise, the request is forwarded to the local GeoServer on port 8080, and the response is cached on its way back out.

Note: as mentioned above, steps 2 and 3 are mostly broken for me. I may fix them later.
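For reference, here is a minimal sketch of the kind of Squid configuration involved. This is not my exact config: it uses Squid 2.6-style accelerator syntax, and the hostnames and IP addresses are placeholders.

    # Listen on port 80 in accelerator mode (Squid 2.6 syntax)
    http_port 80 accel vhost

    # Cache misses are forwarded to the local GeoServer on port 8080
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver

    # The other two squids are siblings: ask them (ICP on port 3130)
    # for content before falling back to GeoServer
    cache_peer node2.example.com sibling 80 3130 proxy-only
    cache_peer node3.example.com sibling 80 3130 proxy-only

    # Let the sibling caches query and fetch from this cache
    acl siblings src 192.168.0.2 192.168.0.3
    icp_access allow siblings
    http_access allow siblings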
Great! Except I have three IP addresses and no way to distribute load across the three servers. So I found an old P-III 733 and installed Debian 3.1 on it. I also installed "balance", a TCP load-balancer: it lets us expose one external IP address as an end-point for all the machines, and round-robin incoming requests through to the three different back-end servers. So now we have one public IP address which forwards requests at the TCP level (round-robin style) through to the back-end "cluster" of identically configured machines.

The only remaining problem was how to maintain a consistent configuration across the different machines. The very day that I faced this problem, GeoServer released GEOSERVER_DATA_DIR support. Sweet. So I set up Samba on the P-III 733 load-balancer machine, and set up the three "workhorse" machines to mount the Samba shared directory. I then set up the workhorses to use the shared directory as their GEOSERVER_DATA_DIR (and actually linked their geoserver.war file onto that share too, so deployment of a new version of GeoServer is simply copy-to-samba-share -> restart all three machines).

I then set up ssh and keychain on those machines, and wrote a script which performs some basic admin tasks like "start, stop, reload-config" on all machines at once, from a central place. This script is available as an attachment to this page. Sketches of the balance invocation, the Samba setup, and such a script follow below.
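The balance invocation itself is a one-liner. A sketch, assuming Squid on each back-end box listens on port 80, with made-up back-end addresses:

    # Round-robin incoming port-80 connections across the three back-ends
    balance 80 192.168.0.1 192.168.0.2 192.168.0.3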
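A sketch of the shared-data-directory setup; the share name, paths, and username are assumptions, not my actual values:

    # /etc/samba/smb.conf on the load-balancer: share the data directory
    [geodata]
        path = /srv/geoserver_data
        writable = yes
        guest ok = no

    # On each workhorse: mount the share and point GeoServer at it
    mount -t smbfs -o username=geoserver //balancer/geodata /mnt/geodata
    export GEOSERVER_DATA_DIR=/mnt/geodata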
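For readers who can't get at the attachment, here is a hypothetical sketch of the same idea (hostnames and the init-script path are assumptions; the attached script may differ):

    #!/bin/sh
    # Run basic admin tasks on every node of the cluster at once.
    # Assumes passwordless ssh to each host (set up via keychain).
    HOSTS="node1 node2 node3"

    case "$1" in
      start|stop|restart)
        for h in $HOSTS; do
          echo "== $h: $1"
          ssh "$h" "/etc/init.d/tomcat5 $1"
        done
        ;;
      *)
        echo "usage: $0 start|stop|restart" >&2
        exit 1
        ;;
    esac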
Note that this work was groundbreaking towards getting GeoServer working with caching, but there are some improved solutions of late. See the TileCache Tutorial for a really nice way to get WMS caching. These notes on load balancing and clustering are still relevant, though.